Skip to content

Conversation

@arichardson
Copy link
Owner

Please take a look and let me know what you think before I send this upstream.

@jrtc27 @resistor @veselypeta @davidchisnall

@resistor
Copy link

resistor commented Oct 1, 2025

I'd like to pre-merge this with Cheriot before sending it upstream, if possible. Otherwise we won't be able to track upstream while I sort out all of the merge & test issues. I can start on that next week, most likely.

let isConstant = true in def X0_Y : RISCVCapReg<X0, "x0", ["zero", "null"]>,
DwarfRegAlias<X0>;
let CostPerUse = [0, 1] in {
def X1_Y : RISCVCapReg<X1, "x1", ["ra"]>, DwarfRegAlias<X1>;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would folks feel about supporting the "c"-prefixed names as a non-standard extension for backwards compatibility?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like this, but wasn't sure upstream would want it. Similarly adding aliases for the mnemonics downstream would make a lot of sense

Copy link

@resistor resistor Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think upstream will be sympathetic to backwards compatibility so long as it does not compromise in the implementation too much. In this case it seems like it would be minimal to support?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If adding it to altnames is sufficient then I imagine that could be fine

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally included support for the original mnemonics but that adds a lot of untested code upstream so probably better to keep that downstream?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If adding it to altnames is sufficient then I imagine that could be fine

That's what I had in mind

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally included support for the original mnemonics but that adds a lot of untested code upstream so probably better to keep that downstream?

I don't think it matters to us too much. Our bincompat is for xcheri, which we will just have to have live in parallel with the new opcodes & mnemonics for now.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the only shared part will be the register info change. I imagine allowing x names may start triggering weird diagnostics until we ignore the "wrong-mode" load/store instructions which currently never match due to the wrong register name. Morello has a IgnoredFeatures tablegen change to ignore them in the matcher and I should probably pull that in once I get to the loads and stores. So it might make sense for you to hold off integrating this into cheriot until I have that.

// fully backwards compatible with non-Y code).
def FeatureCapMode : SubtargetFeature<"cap-mode", "IsCapMode", "true",
"Capability pointer mode">;
def IsCapMode
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We renamed this to XCheriPureCap using RISCVExtension, so that it plays nice with the rest of the RISCV extension infrastructure. That enables us to do things like have XCheriot automatically imply XCheriPureCap.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though given the direction of the Y standard, perhaps we should invert the sense of the feature bit?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on this part?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm can't you have XCheriot imply the CapMode subtargetfeature?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It likely shouldn't have a x prefix since that is for vendor-extensions and the standard will have this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm can't you have XCheriot imply the CapMode subtargetfeature?

We could do that in code, but the generic infrastructure for RV features implying other RV features only works for things declared as standard extensions or vendor extensions, and it then enforces the naming scheme.

Copy link

@resistor resistor Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to page it all back in now, I think the issue might have been that CapMode can't imply XCheri (or Y in the future) because extensions can imply features, but features can't imply extensions.

Copy link
Owner Author

@arichardson arichardson Oct 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I was not aware that upstream introduced this restriction. I'm pretty sure you used to be able to imply certain non-extension features like CapMode.

@arichardson arichardson force-pushed the 2025-rvy-initial branch 2 times, most recently from 4c33fc1 to d7af9f1 Compare October 7, 2025 07:19
@jrtc27
Copy link

jrtc27 commented Oct 9, 2025

Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?

@arichardson
Copy link
Owner Author

Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?

We still need a feature bit to select I vs Y as the base ISA, this does not exist right now and is covered by capmode. Or are you suggesting we introduce something like BaseIsaI BaseIsaY features instead of the capmode ones?

@arichardson
Copy link
Owner Author

Right now plain loads have no predicates so we would need to add one?

@resistor
Copy link

Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase

@jrtc27
Copy link

jrtc27 commented Oct 10, 2025

Given we have a new base ISA, I don't see why we need both I/E/Y and int/cap mode? We only have int/cap mode today because both are RVI, but you can just distinguish Y and not-Y, surely?

We still need a feature bit to select I vs Y as the base ISA, this does not exist right now and is covered by capmode. Or are you suggesting we introduce something like BaseIsaI BaseIsaY features instead of the capmode ones?

i and e are already feature strings (for FeatureStdExtI and FeatureStdExtE). With y (FeatureStdExtY) you can then use those for predicates.

@jrtc27
Copy link

jrtc27 commented Oct 10, 2025

Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase

Then you just add your base ISA to the list of ones for the capability mode predicate, like I and E are both part of the integer mode predicate.

@jrtc27
Copy link

jrtc27 commented Oct 10, 2025

Right now plain loads have no predicates so we would need to add one?

Yes, but that's true regardless of whether the predicate is cap-mode or i-or-e.

@resistor
Copy link

Also I need to be able to rebase XCheriot on it, which is hindered if cap mode is not distinct from YBase

Then you just add your base ISA to the list of ones for the capability mode predicate, like I and E are both part of the integer mode predicate.

Ah, I didn't realize you were suggesting not having a feature but still retaining a tblgen predicate. I think that would work.

@arichardson
Copy link
Owner Author

Right now plain loads have no predicates so we would need to add one?

Yes, but that's true regardless of whether the predicate is cap-mode or i-or-e.

Yeah I think I misread you statement and we are all in agreement. We just need to figure out what the user-visible API to select the features should be.

Given we have the following four classes of instructions:

Instruction Available Base=I/E Available Base=Y Ext Current Extra Predicate
MUL ✔️ ✔️ StdExtI -
LW ✔️ StdExtI NotCapMode
LW_Y ✔️ StdExtY IsCapMode
YBASER ✔️ ✔️ StdExtY -

we will need the extra predicates for the mode-dependent instructions.
Do you suggest we use something like BaseIsaY instead of the existing capmode name?

XCheriot should be able to do something like

def FeatureVendorXCheriot
    : RISCVExtension<1, 0, "CHERIoT extension", [FeatureStdExtE, FeatureStdExtY, BaseIsaY]>;

@resistor
Copy link

I think the proposal was to treat XCheriot as its own base ISA, and define IsCapMode as explicitly checking for IsStdExtY || IsXCheriot

@arichardson
Copy link
Owner Author

arichardson commented Oct 10, 2025

I think the proposal was to treat XCheriot as its own base ISA, and define IsCapMode as explicitly checking for IsStdExtY || IsXCheriot

Oh I misread the previous ones, you are suggesting having both FeatureStdExtY and IsStdExtY? We still need to handle instructions like ybaser that work for both Y and I mode.

arichardson and others added 7 commits November 16, 2025 17:45
This is the first commit in a series of changes to add initial MC-layer
support for the upcoming Y extension for CHERI.

Specification: https://riscv.github.io/riscv-cheri/
Co-authored-by: Jessica Clarke <[email protected]>
This adds initial features for the base RVY extension,
other extensions such as the hybrid mode will be added later.

Co-authored-by: Jessica Clarke <[email protected]>
Co-authored-by: Alexander Richardson <[email protected]>
Co-authored-by: Petr Vesely <[email protected]>
This adds MC-level support for most of the base Y extension instructions,
restricted to the execution-mode-independent subset. The Y extension
(CHERI for RISC-V) also introduces an execution mode that determines
whether certain register operands use the full extended register or only
the address subset (the current XLEN registers). The instructions that
depend on execution mode (loads/stores/jumps + AUIPC) will be added in
the next commit in this stack of changes.

Co-authored-by: Jessica Clarke <[email protected]>
Co-authored-by: Alexander Richardson <[email protected]>
Co-authored-by: Petr Vesely <[email protected]>
This adds supports for all new RVY loads/stores (capability-wide
versions: ly/sy instructions). Additionally, for RVY (CHERI), loads and
stores are mode-dependent, using either a YLEN register or a XLEN
register as the base.
In the former case loads/stores are authorized by that register,
and in the latter (compatibility mode), the loads/stores keep using an
address but are authorized by the DDC CSR. The assembler mnemonics are
the same in both cases. Prior to the standardization process CHERI
assembly used c-prefixed register names for capabilities, so we had the
following syntax:

lw x4, 0(c3) # capability mode: use new `CLW` instruction
lw x4, 0(x3) # integer mode: use existing `LW` instruction

During the standardization this was changed to keep the same register
name in both modes, so now we have `lw x4, 0(x3)` in both modes.

This allows using the RegClassByHwMode feature to reuse the same MC
instruction but with a different operand type depending on the HwMode.
The downstream fork had duplicated definitions which meant a lot of
switch statements now needed to handle both MCInsts. This approach
using HwMode should be much more maintainable and only introduces a
minor diff compared to what we had downstream.
This will also make it much easier adding support for RVY versions of
other extensions such as vector, since we just need to change out the
`GPRMem` operand with `PtrMem`.
…rser

This ensures the broken Asmparser expansions trigger an assertion error.
Noticed this while adding Y extension support where expanding pseudos
generated instructions that were failing predicates but unlike
instructions generated during codegen the predicate verification function
was not being called here.
…Class

Also fix the missing space in the error message. I notice while changing
RISC-V's loads and stores to use RegClassByHwMode and got a non-descriptive
error when tablegen was parsing the InstAliases.
arichardson pushed a commit that referenced this pull request Nov 17, 2025
## Summary

Fix `FindProcesses` to respect Android's `hidepid=2` security model and
enable name matching for Android apps.

## Problem

1. Called `adb shell pidof` or `adb shell ps` directly, bypassing
Android's process visibility restrictions
2. Name matching failed for Android apps - searched for
`com.example.myapp` but GDB Remote Protocol reports `app_process64`

Android apps fork from Zygote, so `/proc/PID/exe` points to
`app_process64` for all apps. The actual package name is only in
`/proc/PID/cmdline`. The previous implementation applied name filters
without supplementing with cmdline, so searches failed.

## Fix

- Delegate to lldb-server via GDB Remote Protocol (respects `hidepid=2`)
- Get all visible processes, supplement zygote/app_process entries with
cmdline, then apply name matching
- Only fetch cmdline for zygote apps (performance), parallelize with
`xargs -P 8`
- Remove redundant code (GDB Remote Protocol already provides GID/arch)

## Test Results

### Before this fix:

```
(lldb) platform process list
error: no processes were found on the "remote-android" platform

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192                                   com.example.hellojni
                         ^^^^^^^^ Missing triple!
```

### After this fix:

```
(lldb) platform process list
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
1      0      root       aarch64-unknown-linux-android  init
2      0      root                                      [kthreadd]
359    1      system     aarch64-unknown-linux-android  app_process64
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni
5357   5355   u0_a192    aarch64-unknown-linux-android  sh
5377   5370   u0_a192    aarch64-unknown-linux-android  lldb-server
                          ^^^^^^^^ User-space processes now have triples!

(lldb) platform process list -n com.example.hellojni
1 matching process was found on "remote-android"
PID    PARENT USER       TRIPLE                         NAME
====== ====== ========== ============================== ============================
5276   359    u0_a192    aarch64-unknown-linux-android  com.example.hellojni


(lldb) process attach -n com.example.hellojni
Process 5276 stopped
* thread #1, name = 'example.hellojni', stop reason = signal SIGSTOP
```

## Test Plan

With an Android device/emulator connected:

1. Start lldb-server on device:
```bash
adb push lldb-server /data/local/tmp/
adb shell chmod +x /data/local/tmp/lldb-server
adb shell /data/local/tmp/lldb-server platform  --listen 127.0.0.1:9500 --server
```

2. Connect from LLDB:
```
(lldb) platform select remote-android
(lldb) platform connect connect://127.0.0.1:9500
(lldb) platform process list
```

3. Verify:
- `platform process list` returns all processes with triple information
- `platform process list -n com.example.app` finds Android apps by
package name
- `process attach -n com.example.app` successfully attaches to Android
apps

## Impact

Restores `platform process list` on Android with architecture
information and package name lookup. All name matching modes now work
correctly.

Fixes llvm#164192
arichardson pushed a commit that referenced this pull request Nov 17, 2025
…am (llvm#167724)

This got exposed by `09262656f32ab3f2e1d82e5342ba37eecac52522`.

The underlying stream of `m_os` is referenced by the `TextDiagnostic`
member of `TextDiagnosticPrinter`. It got turned into a
`llvm::formatted_raw_ostream` in the commit above. When
`~TextDiagnosticPrinter` (and thus `~TextDiagnostic`) is invoked, we now
call `~formatted_raw_ostream`, which tries to access the underlying
stream. But `m_os` was already deleted because it is earlier in the
order of destruction in `TextDiagnosticPrinter`. Move the `m_os` member
before the `TextDiagnosticPrinter` to avoid a use-after-free.

Drive-by:
* Also move the `m_output` member which the `m_os` holds a reference to.
The fact it's a reference indicates the expectation is most likely that
the string outlives the stream.

The ASAN macOS bot is currently failing with this:
```
08:15:39  =================================================================
08:15:39  ==61103==ERROR: AddressSanitizer: heap-use-after-free on address 0x60600012cf40 at pc 0x00012140d304 bp 0x00016eecc850 sp 0x00016eecc848
08:15:39  READ of size 8 at 0x60600012cf40 thread T0
08:15:39      #0 0x00012140d300 in llvm::formatted_raw_ostream::releaseStream() FormattedStream.h:205
08:15:39      #1 0x00012140d3a4 in llvm::formatted_raw_ostream::~formatted_raw_ostream() FormattedStream.h:145
08:15:39      llvm#2 0x00012604abf8 in clang::TextDiagnostic::~TextDiagnostic() TextDiagnostic.cpp:721
08:15:39      llvm#3 0x00012605dc80 in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:30
08:15:39      llvm#4 0x00012605dd5c in clang::TextDiagnosticPrinter::~TextDiagnosticPrinter() TextDiagnosticPrinter.cpp:27
08:15:39      llvm#5 0x0001231fb210 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39      llvm#6 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39      llvm#7 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39      llvm#8 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39      llvm#9 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39      llvm#10 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39  0x60600012cf40 is located 32 bytes inside of 56-byte region [0x60600012cf20,0x60600012cf58)
08:15:39  freed by thread T0 here:
08:15:39      #0 0x0001018abb88 in _ZdlPv+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4bb88)
08:15:39      #1 0x0001231fb1c0 in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39      llvm#2 0x0001231fb3bc in (anonymous namespace)::StoringDiagnosticConsumer::~StoringDiagnosticConsumer() ClangModulesDeclVendor.cpp:47
08:15:39      llvm#3 0x000129aa9d70 in clang::DiagnosticsEngine::~DiagnosticsEngine() Diagnostic.cpp:91
08:15:39      llvm#4 0x0001230436b8 in llvm::RefCountedBase<clang::DiagnosticsEngine>::Release() const IntrusiveRefCntPtr.h:103
08:15:39      llvm#5 0x0001231fe6c8 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
08:15:39      llvm#6 0x0001231fe858 in (anonymous namespace)::ClangModulesDeclVendorImpl::~ClangModulesDeclVendorImpl() ClangModulesDeclVendor.cpp:93
...
08:15:39
08:15:39  previously allocated by thread T0 here:
08:15:39      #0 0x0001018ab760 in _Znwm+0x74 (libclang_rt.asan_osx_dynamic.dylib:arm64e+0x4b760)
08:15:39      #1 0x0001231f8dec in lldb_private::ClangModulesDeclVendor::Create(lldb_private::Target&) ClangModulesDeclVendor.cpp:732
08:15:39      llvm#2 0x00012320af58 in lldb_private::ClangPersistentVariables::GetClangModulesDeclVendor() ClangPersistentVariables.cpp:124
08:15:39      llvm#3 0x0001232111f0 in lldb_private::ClangUserExpression::PrepareForParsing(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, bool) ClangUserExpression.cpp:536
08:15:39      llvm#4 0x000123213790 in lldb_private::ClangUserExpression::Parse(lldb_private::DiagnosticManager&, lldb_private::ExecutionContext&, lldb_private::ExecutionPolicy, bool, bool) ClangUserExpression.cpp:647
08:15:39      llvm#5 0x00012032b258 in lldb_private::UserExpression::Evaluate(lldb_private::ExecutionContext&, lldb_private::EvaluateExpressionOptions const&, llvm::StringRef, llvm::StringRef, std::__1::shared_ptr<lldb_private::ValueObject>&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) UserExpression.cpp:280
08:15:39      llvm#6 0x000120724010 in lldb_private::Target::EvaluateExpression(llvm::StringRef, lldb_private::ExecutionContextScope*, std::__1::shared_ptr<lldb_private::ValueObject>&, lldb_private::EvaluateExpressionOptions const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>*, lldb_private::ValueObject*) Target.cpp:2905
08:15:39      llvm#7 0x00011fc7bde0 in lldb::SBTarget::EvaluateExpression(char const*, lldb::SBExpressionOptions const&) SBTarget.cpp:2305
08:15:39  ==61103==ABORTING
...
```
arichardson pushed a commit that referenced this pull request Nov 17, 2025
@arichardson arichardson changed the base branch from upstream-basline to master November 17, 2025 01:48
@arichardson arichardson changed the base branch from master to upstream-basline November 17, 2025 01:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants